Algorithms for Next-Generation Sequencing Data by Mourad Elloumi

Author: Mourad Elloumi
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


8.3.4 Whole-Genome Ends-Repair

The Illumina WGBS protocol includes an end-repair step after the genome has been fragmented by sonication. To restore fully double-stranded DNA, any overhanging ends are filled in with complementary unmethylated nucleotides, which can lead to an underestimation of methylation at those positions. This kind of bias at the ends of the reads can be included under the term M-bias [24], because it must be handled differently from other sequence contaminations; M-bias detection and management are reviewed below (see Sect. 8.7.3).

Although all of the sequence contaminations reviewed can bias the methylation calling steps, for most of them the bias depends on the presence of thymines and/or cytosines in the added artificial sequence. This is not the case for the nucleotides introduced by end repair (Sects. 8.3.3 and 8.3.4), which require special consideration. These nucleotides preserve the sequence information but not the original methylation status, so they can align perfectly in a methylation context and introduce a large bias in the methylation levels at those positions.
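The effect described above can be made visible by computing the methylation level separately for each position in the read. The following is a minimal Python sketch, not the method of any particular tool. It assumes each read comes with a per-base methylation call string in which 'Z' marks a methylated CpG cytosine and 'z' an unmethylated one (the encoding used in Bismark-style call strings); the function name `m_bias` and the toy reads are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def m_bias(call_strings):
    """Return the CpG methylation fraction observed at each read position.

    call_strings: iterable of per-read call strings, one character per base.
    Assumed encoding: 'Z' = methylated CpG cytosine, 'z' = unmethylated CpG
    cytosine; every other character is ignored.
    Positions are 0-based and counted in the order the string is given.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for calls in call_strings:
        for pos, call in enumerate(calls):
            if call in "Zz":
                total[pos] += 1
                if call == "Z":
                    methylated[pos] += 1
    # Only positions with at least one CpG call appear in the result.
    return {pos: methylated[pos] / total[pos] for pos in sorted(total)}


# Toy call strings, invented for illustration only.
reads = ["Z..Z..z", "Z..Z..z", "z..Z..z"]
profile = m_bias(reads)
for pos, level in profile.items():
    print(pos, round(level, 2))
```

In this toy input the last read position is never called methylated, which is the pattern that filled-in unmethylated cytosines would be expected to produce. With real data, a consistent drop in methylation at the read ends would suggest that those positions should be excluded from methylation calling. Reads mapped to the reverse strand would need their call strings oriented consistently before being pooled.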

In addition to the control steps built into some alignment and methylation calling programs, several tools are suitable for identifying and clipping these sequences: for example, FastQC [25] to check the sample for sequence contamination, and FASTX-Toolkit [26], Cutadapt [27], or Trimmomatic [28], among others, to trim the artificial sequences.
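To illustrate what clipping an artificial sequence involves, the sketch below removes a 3' adapter from a single read using exact string matching. It is a simplified illustration and not the algorithm of any of the tools named above, which also tolerate mismatches and take base qualities into account. The function name `clip_3prime_adapter`, the `min_overlap` parameter, and the example adapter string are assumptions made for this example.

```python
def clip_3prime_adapter(read, adapter, min_overlap=3):
    """Clip a 3' adapter from a read using exact matching only.

    A full adapter occurrence is removed together with everything after it.
    Otherwise, the longest adapter prefix of at least min_overlap bases that
    matches the end of the read is removed. Reads without a match are
    returned unchanged.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


# Example adapter string, used here only for illustration.
ADAPTER = "AGATCGGAAGAGC"

print(clip_3prime_adapter("ACGTAGATCGGAAGAGCTTT", ADAPTER))  # full adapter inside the read
print(clip_3prime_adapter("ACGTACGTAGATCGGAAG", ADAPTER))    # partial adapter at the read end
print(clip_3prime_adapter("ACGTACGT", ADAPTER))              # no adapter present
```

Sequence-based clipping of this kind only finds contamination with a known sequence. The nucleotides introduced by end repair match the genome, so they cannot be detected this way and have to be handled through position-based trimming or M-bias filtering instead.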


